Sequence analysis

ScaffMatch: scaffolding algorithm based on maximum weight matching

Authors

  • Igor Mandric
  • Alex Zelikovsky
  • John Hancock
Abstract

Motivation: Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from the paired-end reads are merged into bigger chains called scaffolds. Because of a high level of statistical noise, chimeric reads, and genome repeats, scaffolding is a challenging task. Current scaffolding software packages vary widely in quality and depend heavily on read data quality and genome complexity. There are no clear winners, and multiple opportunities for further improvement of the tools still exist.

Results: This article presents an efficient scaffolding algorithm, ScaffMatch, that is able to handle reads with both short (<600 bp) and long (>35 000 bp) insert sizes, producing high-quality scaffolds. We evaluate our scaffolding tool with the F score and other metrics (N50, corrected N50) on eight datasets, comparing it with most of the available packages. Our experiments show that ScaffMatch is the tool of preference for most datasets.

Availability and implementation: The source code is available at http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
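The N50 metric named in the abstract can be illustrated with a short sketch. It assumes the common definition of N50 (the largest length L such that scaffolds of length at least L cover at least half of the total assembled length). The function name `n50` and the sample lengths are hypothetical and do not come from the ScaffMatch source code. The corrected N50, which requires scaffolds to be split at misassemblies against a reference first, is not shown.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumes the usual definition: sort lengths in descending order and
    return the length at which the running sum first reaches at least
    half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (in bp) for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))  # prints 8
```

In this toy input the total length is 54, and the three longest scaffolds (10, 9, 8) together reach 27, which is half of the total, so the N50 is 8.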


Similar resources


Measurement of Left Ventricular Myocardium Wall Instantaneous Motions with Echocardiographic Sequence Images

Background & Aims: One of the important aims of quantitative cardiac image processing is the clarification of myocardial motions in order to derive biomechanical behavior of the heart in the disease condition. In this study we presented a computerized analysis method for detecting the instantaneous myocardial changes by using 2D echocardiography images. Methods: The analysis was performed on th...


On the inverse maximum perfect matching problem under the bottleneck-type Hamming distance

Given an undirected network G(V,A,c) and a perfect matching M of G, the inverse maximum perfect matching problem consists of modifying minimally the elements of c so that M becomes a maximum perfect matching with respect to the modified vector. In this article, we consider the inverse problem when the modifications are measured by the weighted bottleneck-type Hamming distance. We propose an alg...


Maintaining Approximate Maximum Weighted Matching in Fully Dynamic Graphs

We present a fully dynamic algorithm for maintaining approximate maximum weight matching in general weighted graphs. The algorithm maintains a matching M whose weight is at least 1/8 M*, where M* is the weight of the maximum weight matching. The algorithm achieves an expected amortized O(log n log C) time per edge insertion or deletion, where C is the ratio of the weights of the highest weight edg...


Extraction of the Longitudinal Movement of the Carotid Artery Wall using Consecutive Ultrasonic Images: a Block Matching Algorithm

Introduction: In this study, a computer analysis method based on a block matching algorithm is presented to extract the longitudinal movement of the carotid artery wall using consecutive ultrasonic images. A window (block) is selected as the reference block in the first frame and the most similar block to the reference one is found in the subsequent frames. Material and Methods: The program was...




Publication date: 2015